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S This work aims at performing Functional Principal Components Analysis (FPCA) 

h— ^ with Horvitz- Thompson estimators when the observations are curves collected with 

/^ survey sampling techniques. One important motivation for this study is that FPCA is 

^vQ a dimension reduction tool which is the first step to develop model assisted approaches 

that can take auxiliary information into account. FPCA relies on the estimation of the 

eigenelements of the covariance operator which can be seen as nonlinear functionals. 

Adapting to our functional context the linearization technique based on the influence 

function developed by Deville (1999), we prove that these estimators are asymptotically 

'^^ design unbiased and consistent. Under mild assumptions, asymptotic variances are 

Cd derived for the FPCA' estimators and consistent estimators of them are proposed. Our 

S approach is illustrated with a simulation study and we check the good properties of the 

1—^ proposed estimators of the eigenelements as well as their variance estimators obtained 

^ with the linearization approach. 

> 

Qs Keywords : covariance operator, eigenfunctions, Horvitz-Thompson estimator, influence 

^O function, model assisted estimation, perturbation theory, survey sampling, variance estima- 

^J tion, von Mises expansion. 

O 1 Introduction and notations 

00 

• • Functional Data Analysis whose main purpose is to provide tools for describing and mod- 

. I^ eling sets of curves is a topic of growing interest in the statistical community The books 

/\ by Ramsay and Silverman (2002, 2005) propose an interesting description of the available 

3 procedures dealing with functional observations whereas Ferraty and Vieu (2006) present a 

completely nonparametric point of view. These functional approaches mainly rely on gener- 
alizing multivariate statistical procedures in functional spaces and have been proved useful 
in various domains such as chemometrics (Hastie and Mallows, 1993), economy (Kneip and 
Utikal, 2001), climatology (Besse et al. 2000), biology (Kirkpatrick and Heckman, 1989, 
Chiou et al. 2003) or remote sensing (Cardot et al., 2003). These functional approaches 
are generally more appropriate than longitudinal data models or time series analysis when 
there are for each curve many measurement points (Rice, 2004). 



When dealing with functional data, the statistician generally wants, in a first step, to 
represent as well as possible the sample of curves in a well chosen small dimension space 
in order to get a description of the functional data that allows interpretation. This objec- 
tive can be achieved by performing a Functional Principal Components Analysis (FPCA) 
which provides a small dimension space able to capture, in an optimal way according to a 
variance criterion, the main modes of variability of the data. These modes of variability are 
given by considering, once the mean function has been subtracted off, projections onto the 
space generated by the eigenfunctions of the covariance operator associated to the largest 
eigenvalues. This technique is also known as the Karhunen-Loeve expansion in probability 
or Empirical Orthogonal Functions (EOF) in climatology and numerous works have been 
published on this topic. From a statistical perspective, the seminal paper by Deville (1974) 
introduces the functional framework whereas Dauxois et al. (1982) give asymptotic distri- 
butions. More recent works deal with smoothing or interpolation procedures (Castro et al, 
1986, Besse and Ramsay, 1986, Cardot, 2000 or Benko et al. 2009) as well as bootstrap 
properties (Kneip and Utikal, 2001) or sparse data (James et al., 2000). 

The way data are collected is seldom taken into account in the literature and one gen- 
erally supposes the data are independent realizations drawn from a common functional 
probability distribution. Even if this assumption can be supposed to be satisfied in most 
situations, there are some cases for which it will lead to estimation procedures that are 
not adapted to the sampling scheme. Design of experiments approaches have been studied 
by Cuevas et al. (2003) but nothing has been done in the functional framework, as far 
as we know, from a survey sampling point of view whereas it can have some interest for 
practical applications. For instance, Dessertaine (2006) considers the estimation with time 
series procedures of electricity demand at fine time scales with the observation of individual 
electricity consumption curves. In this study, the data are functions of time measured every 
ten minutes with more than 1000 time point observations and can be naturally thought as 
functional data. Moreover, the individuals {e.g. electricity meters) are selected according to 
balancing techniques (Deville and Tille, 2004) and consequently they do not have the same 
probability to belong to the sample. More generally, there are now data (data streams) pro- 
duced automatically by large numbers of distributed sensors which generate huge amounts 
of data that can be seen as functional. The use of sampling techniques to collect them 
proposed for instance in Chiky and Hebrail (2009) seems to be a relevant approach in such 
a framework allowing a trade off between limited storage capacities and accuracy of the 
data. In such situations classical estimation procedures will lead to misleading interpreta- 
tion of the FPCA since the mean and covariance structure of the data will not be estimated 
properly. 

We propose in this work estimators of the FPCA when the curves are collected with 
survey sampling strategies. Let us note that Skinner et al. (1986) have studied some 
properties of multivariate PCA in such a survey framework. Unfortunately, this work has 
received little attention in the statistical community. The functional framework is different 
since the eigenfunctions which exibit the main modes of variability of the data are also 
functions of time and can be naturally interpreted as modes of variability varying along 
time. FPCA can also be, by its dimension reduction properties, a useful tool if one wants to 
use model-assisted approache (Sarndal et al., 1992) that can take auxiliary information into 
account. Adapting for instance the single index model (Chiou et al. 2003) or the additive 
model (Miiller and Fang, 2008) on the principal components scores in this survey context 
would allow us to consider model assisted and small domain estimation in a functional 
context. 



The paper is structured as follows. We first define, in section 2, functional principal com- 
ponents analysis in a finite population setting. Then we propose estimators of the mean 
function and the covariance operator based on Horvitz-Thompson estimators. We also 
describe how this dimension reduction tool can be of great interest for model assisted and 
small domain estimation when auxiliary information is available. Section 3 is devoted to the 
asymptotic properties. We show in section 3.1 that the FPCA' estimators are asymptotically 
design unbiased and consistent. Section 3.2 provides approximations and consistent estima- 
tors of the variances of FPCA' estimators with the help of perturbation theory (Kato, 1966) 
and the influence function (Deville, 1999). Campbell (1980) proposed, in a pioneer work, 
to use the influence function for estimating the variance of complex statistics and compared 
it with a jackknife variance estimator (see also Berger and Skinner, 2005). In such a func- 
tional context, we can not perform a first-order Taylor expansion of the associated complex 
statistics but we can make a first-order von Mises (1947) expansion of the functional giving 
these complex statistics and obtain under broad assumptions that the asymptotic variance 
of the complex statistics is equal to the variance of the Horvitz-Thompson estimator for 
the population total of some artificial variable u^ constructed using the influence function 
technique (Deville, 1999). A jackknife variance estimator can be obtained by analogy with 
the Deville's linearization variance in which the analytic expression of Uk is replaced by 
its numerical approximation (Davison and Hinkley, 1997). Section 4 proposes a simulation 
study which shows the good behavior of our estimators for various sampling schemes as well 
as the ability of linearization techniques to give good approximations to their theoretical 
variances. The proofs are gathered in an Appendix. 

2 Survey framework and PC A 

2.1 FPCA in a finite population setting 

Let us consider again the example of the estimation of the electricity demand presented in 
the introduction. If measures are taken every ten minutes during 24 hours, the consumption 
curve for one household k belonging to the population is represented by the functional Y^ (t) 
with t being one of the 144 time measurements. In such a situation it is more convenient 
to consider that the observed trajectories are functions, instead of vectors of size 144, 
belonging to a function space that we suppose, from now on and without loss of generality, 
to be -L^[0, 1], the space of square integrable functions defined on the closed interval [0, 1]. 
This space is equipped with the its inner product (•, •) and norm || • ||. 

Let us consider a finite population U = {1, . . . , fe, . . . , A^} with size N, not necessarily 
known, and a functional variable 3^ defined for each element k of the population f7: Y^ = 
{Yk{t))t(z!o^i] belongs to the space L'^[0, 1]. Suppose first that we are looking for the function 
/i G -L^[0, 1] which is the closest to the population curves according to a quadratic loss 
criterion. The criterion J2k&u 11-^*: ~ '^o|| is clearly minimum for 4)0 = ji Xlfcet/ ^k, which 
is the mean population curve : 

keu 

The curves Y^ span a subspace of L^[0, 1] whose dimension can be very large, at most N. 
Going further, we would like now to obtain a subspace of L'^[0, 1] with dimension q, as small 
as possible, that would allow to represent as well as possible the deviation of the population 
curves from their mean function //. Considering an orthonormal basis (j)i}4'2, ■ ■ ■ ,4'q of this 



q dimensional space, it is well known that the projection Pq of the deviation of the Yj. from 
their mean function /i can be expressed as follows 



P,{yk-^J) 
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(Yfc - ^, (j)^ 



,Yj/Yj. 



Considering again a quadratic loss criterion, we would like to minimize the following quantity 
according to the set of orthonormal functions (pi,(p2, ■ ■ ■ ,<Pqj 



Rn 



1 



n,(P2,---,(Pq) 



N 

AT 2^ 
fc=l 



(n - M) 




/i,0j) 



(2) 



To get the solution of this optimization problem, we need to introduce more notations. Let 
us define the covariance operator, say F, of the functions Yfc, /;: € C/, as follows 
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{Yk - /i) {Yk - fi) 



(3) 



where the tensor product of two elements a and b of L^[0, 1] is the rank one operator such 
that a b{u) = {a,u)b for all u in L'^[0, 1]. The operator T is symmetric and non negative 
{{Tu,u) > 0). Its eigenvalues, which are positive and supposed to be sorted in decreasing 
order Ai > A2 > • • • > Xn > 0, satisfy 



Tv^it) = XjVjit), tG [0,1], i = l. 



,N, 



(4) 



i.e 



where the eigenfunctions Vj,j = l,...,g, form an orthonormal system in L^[0,1], 
{vj,Vj') = 1 if J = j' and zero otherwise. Going back to criterion ([2|), one can express 
it as follows 
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(5) 



and we get by maximal properties of the eigenvalues (see Chatelin, 1983) that the minimum 



of Ra 



bq) is attained for 



fi,, 



■ ■ I'+'q 



Thus the optimal subspace of 



dimension q, which is unique if \q > Ag+i, is the space generated by the q eigenfunctions of 
r associated to the q largest eigenvalues. Having these considerations in mind, we can build 
an expansion, which is similar to the Karhunen-Loeve expansion or FPCA, that allows to 
get the best approximation in a finite dimension space with dimension q to the curves of 
the population making the key decomposition 



Yk{t) 



q 

E 



^(t) + V(yfc - /i, Vj)Vj{t) + Rq^kit), t G [0, 1] 



(6) 



where Rq^k{t) is the remainder term. This means that the space generated by the eigen- 
functions vi,- ■ ■ ,Vq gives a representation of the main modes of variation along time t of 
the data around the mean ^i. Moreover, the variance of the projection onto each Vj is given 
by the eigenvalue 



A,- 



1 






N ^-^ 

k& 



{Yk- ^i.Vjf 



since ^ Ekeui^k - ^, Vj) = 0. 

We aim, in the following, at estimating the mean function fj, and the covariance operator 
r in order to deduce estimators of the eigenelements {\j,Vj) when the data are obtained 
with survey sampling procedures. To this purpose, we express our parameters of interest 
as non-linear functions of finite population totals. Next, we substitute each total with its 
Horvitz-Thompson estimator described in the next section and finally, we obtain in section 
^the asymptotic variance adapting the influence function approach (Deville, 1999). 
Remark. In the space -L^[0, 1], we have in an equivalent way the following representation 
of the covariance operator 

Tu{t) = / j{s,t)u{s) ds (7) 

Jo 

where j{s, t) is the covariance function 

^(«'*) = ^E(^fc(*)-^(*))(^^-(«)-^("))' (5,*) e [0,1] X [0,1]. (8) 

Note also that if H = W then we get hack to the classical definition of the principal com- 
ponents, the covariance operator being then the variance-covariance matrix, with size px p, 
of the population vectors. 

2.2 The Horvitz-Thompson Estimator 

Let us consider a sample s of n individuals, i.e. a subset s C U, selected according to a 
probabilistic procedure p{s) where p is a probability distribution on the set of 2^ subsets of 
U. We denote by vr^ = Pr(fc G s) for all k € U the first order inclusion probabilities and by 
TTki = Pi(k Sz I £ s) for all k,l gU with vr^^ = vr/;, the second order inclusion probabilities. 
We suppose that all the individuals and all the pairs of individuals of the population have 
non null probabilities to be selected in the sample s, namely vr^ > and TTki > 0. We also 
suppose that vr^ and iTki are not depending on t G [0, 1]. This means that once we have 
selected the sample s of individuals, we observe Yk{t) for all t £ [0, 1] and all k (z s. Let us 
start with the simplest case, the estimation of the finite population total of the Y^ curves 
denoted by 

tY = Y.Yk. 

keu 

The Horvitz-Thompson (HT) estimator tyn of ty is a function belonging to L^[0, 1] defined 
as follows 

kes ^ k(^U 

where Ik = Iskes} is the sample membership indicator of element k (Sarndal et al., 1992). 
Note that the variables /^ are random with Pr{Ik = 1) = vr^ whereas the curves Y^ are 
considered as fixed with respect to the sampling design p{s). So, the HT estimator ty-,^ is 
p- unbiased, namely 

Ep{tyT,) = ty 

where Ep{-) is the expectation with respect to the sampling design. 



The variance operator of tyn calculated with respect to p{s) is the HT variance 

Vp(?y.) = VVAM^®^ (9) 

and it is estimated p-unbiasedly by V„(ty7r) = > } — with the notation A^i = 

^^ -^kl TTfc TTl 

T^kl—T^kT^l ii k ^ I and A^^ = 7rfc(l — VTfc). One mayobtain equivalent integral representations 
of Vp(tY,r) and Vp(ty^) similar as in equations (M) and te|. 

Example: Let us select a sample of n curves Y^ according to a simple random sample 
without replacement (SI) from U. We have ty-w = (N/n) "^f^^g Y^ with variance VsiCtYw) = 
^'^~W~^YU /''^ / ~ JT-/-^ and Syu = jfzi X^t/(^fc— /w)®(^A;— ^) the population variance. The 
variance estimator is given by VsiiiYw) = ^"^^^^Ys ™^^ ^Ys ~ ^tv-i '^Z sO^k— lJ's)®{Yk— l^s) 
and Us = ^J2s^k- 

2.3 Substitution Estimator for Nonlinear Parameters 

Consider now the estimation of one of the following parameters: ^, T and the eigenelements 
\j and Vj given by ([I|, ([3| and Q. When the population size is unknown, we deal with 
nonlinear functions of population totals. To estimate these parameters, we substitute each 
total by its Horvitz- Thompson estimator as described in the above section. We obtain 
complex statistics whose variances are no longer calculated using formula ([9|. Besides the 
nonlinearity feature, we have to cope now with the fact that 3^ is a functional variable which 
makes the variance estimation issue more difficult. In order to overcome this, we adapt the 
linearization technique based on the influence function introduced by Deville (1999) to the 
functional framework. This approach is based on the fact that each finite population total 
may be written as a functional depending on a finite and discrete measure M and as a 
consequence, the population parameter of interest can be written as a functional T(M). 
We derive the Horvitz-Thompson estimator M of M and estimators or our parameters are 
obtained by pluging-in M in the expression of T, namely T{M). 
Let us introduce now the discrete measure M defined on L^[0, 1] as follows 

M = Y.6y, 
k&U 

where Jy^. is the Dirac function taking value 1 if y = 1^ and zero otherwise. The following 
parameters of interest can be defined as functionals of M: 

^ fydM 

N = dM and fi = ^ 

•^ dM 

{y-fi)'E){y- fi) dM 

dM 

and the eigenelements, given by (El), are implicit functionals T of M. 

The measure M is estimated by the Horvitz-Thompson estimator M associating the weight 



l/vTfc for each Y^ with k £ s and zero otherwise, 

and T{M) is then estimated by T{M) also called the substitution estimator. For example, 
the substitution estimators for // and T are 






^V— (10 

fees 

AT ^ ' TTi^ 



where the size A^ of the population is estimated hy N = \ — . Then estimators of the 

kds 

eigenfunctions {vj,j = 1, . . .q} associated to the q largest eigenvalues {Xj,j = I, . . .q} are 
obtained readily by the eigen-analysis of the estimated covariance operator T. 

Remark. In practice we do not observe the whole curves but generally discretized 
versions at m design points < ti < t2 < ■ ■ ■ < t^ < 1 that we suppose to be the 
same for all the curves. Quadradure rules are often employed in order to get numerical 
approximations to integrals and inner product by summations : for each u in L^[0, 1] we get 
an accurate discrete approximation to the integral 



/ u{t)dt ss >,^^ u{t() 
Jo 7^ 



'0 e=i 

provided the number of design points p is large enough and the grid is sufficiently fine. 
When the discretization points vary from one curve to another basis functions approaches are 
generally employed in order to smooth and to decompose the signals in a common functional 
space (see e.g. Ramsay and Silverman, 2005). 

2.4 Some comments on the interest of FPCA in survey sampling 

As seen in equation (Ig]), the FPCA allows to get a finite and generally small dimension space 
that is able to reconstruct rather well the curves of the population. Indeed the principal 
components scores (Yfc — fi,Vj), for j = 1, . . . ,q, are indicators of the deviation of curve Yk 
from its mean function ^. When auxiliary variables that influence significantly the shape 
of the population curves are known for each element of the population it would certainly 
be of great interest to consider models that could explain the individual fluctuations of the 
principal components scores. This could be useful for instance for improving the total curve 
estimation and small domain estimation. Indeed, suppose we have a set of p real covariates, 
xi, . . . ,Xp and a function fj (to be estimated) such that 

£,- {Yk- fi, Vj) = fjixki, ..., Xkp) + ejk 

where Cjk is supposed to be a random noise for k £ U and j = 1, . . . ,q. Then, having built 
estimators fj of the functions fj using the model ^ and the sampling design p(-), the total 
curve could be estimated considering a model assisted approach 

?Ht) = E^-fe^-EM' *^[°'']- 

fees ^ Vfces ^ u / 



where the predicted Vs values are given by 

q ^ 

^fc(i) = Kt) + X] fj(^kl, • • • , Xkp)Vj{t). 

Chiou et al. (2003) proposed nonparametric estimators of function fj based on single index 
models that could explain the principal components scores thanks to real covariates whereas 
Miiller and Yao (2008) consider additive models. 

Going back to the motivating example of individual electricity consumption curves, 
it is clear that the temperature, the past consumption, or the surface of the household 
can be made available for the all population and are certainly correlated with the shape 
of individual curves. Thus building statistical models that explain the variations of the 
principal components scores should be helpful to provide better estimators of the total 
consumption curve as well as estimation of total curves for small domains. This issue which 
is according to us of great interest deserves further investigations that are beyond the scope 
of this paper. 

3 Asymptotic Properties 

We give in this section asymptotic properties of our estimators fi, T and Xj,Vj. Nevertheless, 
the approach we propose in the following is general and can be useful for estimating other 
non-linear functions of totals. 

Let us consider the superpopulation asymptotic framework introduced by Isaki and Fuller 
(1982) which supposes that the population and the sample sizes tend to infinity. Let Ufq 
be a population with infinite (denumerable) number of individuals and consider a sequence 
of nested sub populations such that Ui C ■ ■ ■ C Uu-i C U^ C Uu+i C • • • C f^ of sizes 
-^1 < -^2 < • • • < ]^u < . • • . Consider then a sequence of samples Si, of size riu drawn from 
Uu according to the fixed-size sampling designs Puis^) and denote by nku and TTkiu their first 
and second order inclusion probabilities . Note that the sequence of sub populations is an 
increasing nested one while the sample sequence is not. For sake of simplicity, we will drop 
the subscript v in the following. 

We assume that the following assumptions are satisfied : 

(Al) sup||Yfc|| < C < cx), 

k&U 
(A2) lim =vrG(0,l), 

(A3) min vr^ > A > , minvrfc; > A* > and limN^ooiT-maxlTTki — iTkiril < cxd. 

Hypothesis (Al) is rather classical in functional data analysis. Note that it does not imply 
that the curves lfc(*) are uniformly bounded in k and t G [0, 1]. Hypotheses (A2) and (A3) 
are checked for usual sampling plans (Robinson and Sarndal, 1983, Breidt and Opsomer, 
2000). 

3.1 ADU-ness and Consistency of Estimators 



The substitution estimators of n and F defined in (10) and (11), as well as Xj and Vj, are 



no longer p-unbiased. Nevertheless, we show in the next that, in large samples, they are 



asymptotically design unbiased (ADU) and consistent. 

An estimator <I> of <I> is said to be asymptotically design unbiased (ADU) if 



lim 



^p($)-$ =0 



We say that ^ satisfies ( <J> — <J> I = Op{un) for a sequence Un of positive numbers if there is 



a constant C such that for any e > 0, Pr 



<J._<J. 



> Cun I < £• The estimator is consistent 



if one can find a sequence Un tending to zero as n tends to infinity such as <J> — <I> = Op{un)- 
Let us also introduce the Hilbert-Schmidt norm, denoted by ||-||2 for operators mapping 
L^[0, 1] to L^[0, 1]. It is induced by the inner product between two operators T and A defined 
by (r, A)2 = YlT^i^^i-' ^^e) ^^ ^^y orthonormal basis {ee)i>i of -L^[0, 1]. In particular, we 
have that ||r||2 = I]£i(re^re^) = Y.j>i ^]- 

Proposition 3.1 Under hypotheses (Al), (A2) and (A3), 

2 

= 0{n-^), 

= 0{n-^), 
= 0{n-^). 
If we suppose that the non null eigenvalues are distinct, we also have, 

\ 2 





A, - A, 



0{n 



-1^ 



and for each fixed j, 

|2 



Ep \\vj 



Vi 



0{n-^] 



As a consequence, the above estimators are ADU and consistent. 
The proof is given in the Appendix. 



3.2 Variance Approximation and Estimation 

Let us now define, when it exists, the influence function of a functional T at point y £ 
L2[0, 1] say IT{M,y), as follows 



ITiM,y) 



lim 



T{M + h6y) - T{M) 
h 



where 6y is the Dirac function at y. Note that this is not exactly the usual definition of the 
influence function (see e.g. Hampel, 1974 or Serfling, 1980) and it has been adapted to the 
survey sampling framework by Deville (1999). We define the linearized variables Uk, k £ U 
as the influence function of T at M and y = Yk, namely 



Uk 



ITiM,Yk). 



Note that the linearized variables depend on Yk for all k £ U and as a consequence, they 
are all unknown. 



We can give a first order von Mises expansion of our functional T, 

T{M) = T{M) + ^IT{M,Yk)(—-l]+RT (12) 

k€U ^^^ ^ 

= T{M) + y^uJ—-l]+RT 

for the reminder Rt and the hnearized variable Uk = IT(M,Yk). The above expansion tells 
us, under regularity conditions, that the asymptotic variance of the estimator T{M) is the 
variance of the HT estimator of the population total of IT (Af , Y^) provided the remainder 
term Rt is negligible. 

Before handling the remainder term, let us first calculate the influence function for our 
parameters of interest. 

Proposition 3.2 Under assumption (Al), we get that the influence functions of ^ and T 
exist and 

If,{M,Yk) = ^(n-/x) (13) 

IT{M,Yk) = l^{{Yk-fi)^{Yk-fi)-T). 

If moreover, the non null eigenvalues of T are distinct, then 

IXj{M,Yk) = l^{{Yk-f^,v,f-Xj) (14) 

(Yfc - fJ.,Vj){Yk - n,Vi) 







iV \ f^ Xj- Xi 



The proof is given in the Appendix. Let us remark that the influence functions of the 
eigenelements are similar to those found in the multivariate framework for classical PCA 
(Croux and Ruiz-Gazen, 2005). 



We are now able to state that the remainder term Rt defined in equation (13) is negligible 
and that the linearization approach can be used to get the asymptotic variance of our 
substitution estimators. 
Let us suppose the supplementary assumption: 

(A4) : The Horvitz-Thompson estimator Y2s ~ satisfies a Central Limit Theorem for the 



S TTfc 



linearized variables Uk given by proposition 3.2 



This assumption is satisfied for classical sampling designs and real quantities u^ (see e.g 
Chen and Rao, 2007, and references therein). The case of functional quantities deserves 
further investigations. Preliminary results can be found in Cardot and Josserand (2009). 

Proposition 3.3 Suppose the hypotheses (Al), (A2) and (A3) are true. Consider the 
functional T giving the parameters of interest defined in pp, (^ and Q). We suppose that 
the non null eigenvalues are distinct. Then Rt = Op{n~'^'^ and 

T(M) - T{M) = Y^uJ--l)+ Op(n-i/2) 

k&U ^^^ ' 
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where the u^ are linearized variable ofT calculated in proposition 3.2. 



If (A4) is also true, the asymptotic variance of Jl, resp. ofvj, is equal to the variance 



operator of the HT estimator > — with u^ given by (13), resp. by (15), and its expression 
is given by 



AV,iTiM)) = Y,T.^ 



kl- 



Uk ^ Ul 



u u 
The asymptotic variance of Xj is 

UkUl 

u u 



TTfe VT; 



^m) = EE^ 



kl- 



T^kT^l 



(16) 



(17) 



with Ufc given by (14 ) 



The proof is given in the Appendix. As one can notice, the asymptotic variances given in 



Proposition 3.3 are unknown since the double sums are considered on the whole population 
U and we have only a subset of it and secondly, the linearized variables Uk are not known. 



As a consequence, we propose to estimate (16) and (17) by the HT variance estimators 



replacing the linearized variables by their estimations. In the case of /I, Xj and Vj, we 
obtain the following variance estimators: 
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T^kf. TTfeVr^ 



with Ivj{Ad,Yi] 



m 



Jf^j 



{Yk -'jl,Vj){Yk -/2 , ve) . 
Xj — Xe 



-Vi 



In order to prove that these variance estimators are consistent we need to introduce 
additional assumptions involving higher order inclusion probabilities. 

(A5) : Denote by Dt^N the set of all distinct t tuples {11,12, ■■■ ,it) from U. We suppose 
that 



lim n^ max \Ep[{Ii^ - ■jri^){Ii^ - Tri.^)^!^ - '^i3)i^ii - '^iM < 00 



lim max \Ep[{IiJi^ - TTi^i^){IiJi^ - Tri,^i^)]\ = 

TV -♦00 («l,«2,«3,«4)e-D4,]V 

lim sup n max \Ep [(/j^ - 7rjJ^(/j2 - 7rJ2)(/j3 - iTi^)] | < 
N^oo {n,«2,*3)e-D3,jv 







CXD 



Hypothesis (A5) is a technical assumption that is similar to assumption A7 in Breidt and 
Opsomer (2000). These authors explain in an interesting discussion that this set of assump- 
tions holds for instance for simple random sampling without replacement (SRSWR) and 
stratified sampling. 
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Proposition 3.4 Under assumptions (Al)-(A5), we have that 

1' 



Ep 



Ep 



AVpifi) - Vpifi) 
AVp(x,) - Vp(x,) 



o 

n 

1 
o 

n 



If moreover T is a finite rank operator whose rank does not depend on N then 

2 ^ In, 



AVp{vj) - Vp{vj) 
for j = l,...,g. 



The proof is given in the Appendix. This theorem impUes that variance estimators for the 
mean function, the eigenvalues and the first q eigenfunctions are asymptotically design unbi- 
ased and consistent. Note that the hypothesis that T is a finite rank operator is a technical 
assumption that is needed in the proof for the eigenfunctions in order to counterbalance 
the fact that eigenfunction estimators are getting poorer as j increases. Note that with 
finite populations, operator T is always a finite rank operator and its rank is at most A^, 
the population size. We probably could assume, at the expense of more complicated proofs, 
that the rank of T tends to infinity as N increases. Allowing then q to tend to infinity with 
the sample size with a rate depending on the shape of the eigenvalues should lead to the 
same variance approximation results for the eigenvectors. 

4 A simulation study 

We check now with a simulation study that we get accurate estimations to the eigenelements 
even for moderate sample sizes as well as good approximation to their variance for simple 
random sampling without replacement (SRSWR) and stratified sampling. In our simula- 
tions all functional variables are discretized in m = 100 equispaced points in the interval 
[0, 1]. Riemann approximations to the integrals are employed to deal with the discretization 
effects. 

We consider a random variable Y following a Brownian motion with mean function 
/x(t) = cos(47rt),t G [0,1] and covariance function cov{s,t) = min(s,i). We make N = 
10000 replications of Y. We construct then two strata Ui and U2 of different variances 
by multiplying the A^i = 7000 first replications of y by ai = 2 and the N2 = 3000 other 
replications by (T2 = 4. Our population U is the union of these two strata. 

To evaluate our estimation procedures we make 500 replications of the following experi- 
ment. We draw samples according to two different sampling designs (SRSWR and stratified) 
and consider two different sample sizes n = 100 and n = 1000. Each stratified sample is 
built by drawing independently two SRSWR of sizes rii in stata Ui and n2 = n — ni in 
strata C/2. The sample sizes are chosen to take into account the different variances in the 
strata: 

Hi _ Ni (Tl 712 _ N2 (J2 

~ ~ ~J^ Niai+N2CT2 ' ^ ~ ~J^ N1CT1+N2CT2 

in analogy with univariate stratified sampling with optimal allocation (Sarndal et al, 1992). 
A stratified sample s of size n = 100 trajectories is drawn in Figure [T] 
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Estimation errors for the first eigenvalue and the first eigenvector are evaluated by 
considering the following loss criterions ^^ ^ and ^^, !!^ (Euclidean norm) among our 
500 replications of the experiments. The approximations turn out to be effective as seen in 
Figure |2] For example for both sampling strategies the first eigenvector approximation has 
a median error lower than 3% for a sample size n = 1000. It also appears that the stratified 
sampling gives better estimations than the SRSWR sampling. 

Let us look now at the variance of our estimators. Tables [T] and |2] give three variance 
(resp. euclidean norm of variance) approximations to the estimator of respectively the first 
eigenvalue and the first eigenvector. The first variance approximation to these estimators 
is their empirical variance and are denoted by Var(Xi) and by Var{vi) , the second one 
is the asymptotic variance denoted by AV{\i) and by AV{vi) whereas the third one is a 
[25%, 75%] confidence interval obtained by estimating the asymptotic variance using the 

HT variance estimator respectively denoted by T^ ( Ai j and Vp{vi) . Errors (see Figure 3) 

inearization approach are evaluated 

A \\Var(v\)-Vp(vr)\\ 
||yar{?;i)|| 



in approximating the variance of the estimators by the 

by considering the following criterions: — — - 

As a conclusion, we first note with t 



Var{\i) 

lis simulation study that HT estimators of the co- 
variance structure of functional observations are accurate enough to derive good estimators 
of the FPCA. Secondly, linear approximations by the influence function give reasonable 
estimation of the variance of the eigenelements for small sample sizes and accurate estima- 
tions as far as n gets larger (n=1000). We also notice that the variance of the estimators 
obtained by stratified sampling turns out to be smaller than with SRSWR sampling. 





n=100 


n= 


1000 




SRSWR 


stratified 


SRSWR 


stratified 


Var{Xi) 


0.314 


0.223 


0.0317 


0.0189 


AV{Xi) 


0.340 


0.209 


0.0309 


0.0183 


% (Ai) 


[0.208;0.430] 


[0.155;0.257] 


[0.027;0.034] 


[0.0169;0.0195] 



Table 1: Variance approximation of the first eigenvalue estimator. 





n=100 


n=1000 




SRSWR 


stratified 


SRSWR 


stratified 


\\Var{vi)\\ 


0.450 


0.286 


0.0396 


0.0265 


\\AV{v,)\\ 


0.3997 


0.287 


0.0386 


0.0267 


ll^p(^i)ll 


[0.335;0.491] 


[0.252;0.354] 


[0.0371;0.0410] 


[0.0256;0.0280] 



Table 2: Norm of the variance approximation of the first eigenvector estimator. 
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Figure 1: A stratified sample of n = 100 curves 
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Figure 2: Estimation errors for two different sampling strategies (SRSWR and stratified 
sampling). First eigenvalue with n = 100. (a) and n = 1000 (b). First eigenvector with 
n = 100. (c) and n = 1000 (d). 
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Figure 3: Estimation errors in the variance approximation for two different sampling strate- 
gies (SRSWR and stratified sampling). First eigenvalue with n = 100. (a) and n = 1000 
(b). First eigenvector with n = 100. (c) and n = 1000 (d). 



16 



Appendix : proofs 



Proof of proposition |3.1 



Let us introduce ak = -^ — 1, we have 



N -N 



N 



fcec/ 



Noting that with assumptions (A2) and (A3), E{a1) = (1— vr/s)/?!^ < (1— 7rfc)/A, \E{akai)\ = 
\Ake/{TTk'^i)\ < |Afc^|/A^, and taking now the expectation, according to the samphng distri- 
bution p, we get 



E„ 



N -N 

N 



m 



y^ Ep{aiak) 



k/eu 



iV2 1 ^ TTk 



E^ + EE 



A 



ke 



keu " '-'" " "- '^ ■ 

1 fN iV(iV-l)nmax|AH| 



n 



A2 






(18) 



which is the first result. Looking now at the estimator of the mean function, we have 



1 "ST- .^ N-N 

- 2^ ttkYk + I — ^^— I /x 



A^ 



fcec/ 



A 



AT ft 



By assumptions (Al)-(A3) it is clear that ||/2|| = 0(1) and consequently Ep 

0{n"^). The first term of the right side of the inequality is dealt with as in (18), noticing 
that llYfcll <Cfor all fe: 



En 



2 

— ^afc^fc = 7^ E Epiaiak) {Yk,Yi\ 

k&U k/&U 



< 



A2 

^ Y, \Ep{a,ak)\\\Yk\\m\ 
k,e.&u 

n 



A2 
O 



To complete the proof, let us introduce the operator Zk = Yk <^Yk and remark that 



r-r = 



A 



V] akZk + ( — - T7 ) E — ^k + ^(8)/U-/U(g)/I 

f^T VA NJf-^TTk 

k£U k£s 
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By assumption (Al), we have that |(Z^.,Z£)2| < ll^fcll ll^ll ^ C*^, for all k and I and we 
get with similar arguments as above that 



En 



2 

kau 2 



and ii^n 



/^fces 



— Zfc = 0(n^^). Remarking now that 



l/X®// - /2(8l/I||2 < ||(/X - /I) (g) ;L(||2 + ll/i® (/X - /I)||. 



the result is proved. 

Consistency of the eigenelements is an immediate consequence of classical properties of 
the eigenelements of covariance operators. The eigenvalues (see e.g. Dauxois et al, 1982) 



satisfy \Xj 



A,l< 



VjW <C6j 



r-r 
-r 



. On the other hand, Lemma 4.3 by Bosq (2000) tells us that 



2\/2 



max 



where 5i = 2\/2(Ai — A2) "^ and for j > 2, 

[{\j-i - Xjy , {Xj - Xj+i)~ ] . 



This concludes the proof. 



(19) 

D 



Proof of proposition 3.2 



Considering first the mean curve /i, we get directly 



^(M + e6y) 

so that 

Ifi{M,Yk) 



N 



\i&u / 



l-^+j^{y-f^) + o{e), 



1 

N 



(n-/i). 



Let us first note that perturbation theory (Kato, 1966, Chatelin 1983) allows us to get the 
influence function of the eigenelements provided the influence function of the covariance 
operator is known. Indeed, let us consider the following expansion of T according to some 
operator Li, 



r(e) = r + eri + o(e), 
we get from perturbation theory that the eigenvalues satisfy 
Xj{e) = Xj + eti{TiPj) + o{e), 



(20) 



(21) 



where Pn 



Vil 



')Vj is the projection onto the space spanned by Vj and the trace of an operator 



A defined on L [0, 1] is defined by tr(A) = ^ (Aej, Cj) for any orthonormal basis Cj, j > 1 
of L^[0,1]. There exists a similar result for the eigenfunctions which states, provided e is 
small enough and for simplicity that the non null eigenvalues are distinct, that 

Vj{e) = v,+e{SjTi{vj)) + o{e), (22) 

where operator Sj is defined on L^[0, 1] as follows 

Vi Vl 



Si 



i^j ^^ 



Xi 
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So going back to the notion of influence function, if we get an expression for Fi in our 
case, we will be able to derive the influence function for the eigenelements. The influence 
function of T can be computed directly using the definition, 

r(e) = r{M + e5y) 

1 



N + e 



Y^ {Yi Ye) + e{y y) 



\eeu 



1 

(N + e) 



;{Nfi + €y)<S){Nfi + ey) 



r + -^ (y y - ^ ^ - r) - ^ (/x (y - /x) + (y - ^) ® /x) + o(e) 
r + ^{{y-fi)0{y-fi)-r) + o(e) 



(23) 



so that 

ir{M,Yk) 



1 

N 



{{Yk -fi)CS {Yk -fi)-T). 



The combination of (21) and ( |23[ ) give us the influence function of the jth eigenvalue 
IXj{M,Yk) = ^{{Yk-fi,vjf-Xj) 
as well as the influence function of the j'th eigenfunction (since {vj,vi) = when j ^ i) 

1 sr- O^k- fJ-, Vj){Yk - /x, vi) 
M Z^ ^ ^ ^^ 



Ivj{M,Yk) 



N 



Jy^j 



Xj — X£ 



n 



Proof of proposition 3.3 

Let us begin with the mean function. The remainder term is defined as follows 



R^ = ll-fi- / If^{M, Y)d{M - M) 



and 



Ri, 



^-/x 



/J 1 






Yk- fJ- 



kGs 



+ /^ 



N 



= Op(n-i/2), 

since fi — Ji = Op{n^^'^) and (A^ — N)/N = Op{n~^i'^) by proposition 
For the covariance operator, we have 

Rv = r-r-l^-((n-/x)55(yfc-/x)-r) 



3.1 



N 
N 



r-r 



fees 

M + ^ - ^ E ;r ^^' " ''^ ® ^^'^ " ^^ 

/ fces '' 



N 
N 



N 
N 



((m -V■)^{^J'- /^)) 



Op(n-i/2), 



(24) 
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noticing that 



Z^ — - — = i;? (r + /i®/x) . 



kGs 



N 



To study the remainder terms for the eigenelements, we need to go back to the perturbation 
theory and equations (20), (21) and (22). According to (24), with e = n~^'^, we can write 






(25) 



Introducing now ( |25[ ) in equation (21), we get noting that {RrVj,Vj) = Op{n ^'^) 
4 E - (<^^- - /^' ^i)' - ^r^. , ^i» + Op(n- V2) 



^i ~ ^3 






/Aj(M, y)d(M - M) + Op(n-i/2) 



which proves that R\^ = Op{n ^/^). Using now (22) and since SjRrVj = Op{n ^/^), we can 
check with similar arguments that 



Vj - ^j 



V fces '' / 






Aj - Xf 



Ivj{M, Y)d{M -M) + Op{n-'^/^) 
and the proof is complete. 
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Proof of proposition 3.4 



We prove the result for functional linearized variables Uk- For real valued linearized 
variables, for instance for an eigenvalue Aj, the proof is similar replacing the tensor product 
with usual product and the norm || • ||2 with the absolue value | • |. Let us denote by 

Mr{T{M))=yy—-®-=yy—-®-hh 

^ ^ TTkl TTfc TT/ ^^ '^kl VTfc TT/ 

and by 

A = AV{T(M)) - AV{T{M)) and B 

It is clear that 

AV{T{M))-Vp{T(M)) < A + B. 

Let us consider 



AV{T{M)) - Vp{T{M)) 



Ep {A') = E E ^kAk'i'Ep (l - M^ fi 

h icrr hi I'cTT ^ kl / \ 



h'lu \ / Uk ^ Ul Uk' ^ Uli 

(g) 



k,ieu k',i'eu 



T^k'V J XT^k TT/ TTfc/ TT;/ 
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Using the fact that ||tifc "8 U/II2 ^ ll'Wfcllll^dl ^^^ since it is easy to check that ||tifc|| < CN^^ 
where C is a constant that does not depends on fc, we get, under assumptions (A2), (A3) 
and (A4), with a similar decomposition as in Breidt and Opsomer (2000, proof of Th. 3) 



that Ep [A^] 



oln 



and thus Ep{A) = o{n '^). 



Let us study now the second term B and examine separately the case of the mean 
function and the eigenvalues and the case of the eigenfunctions which can not be dealt with 
the same way. We can prove, under assumptions (A2) and (A3), with similar manipulations 
as before that there exist some positive constant C2, C3, C4 and C5 such that 



Ep[B) 



En 



EE— ^^'^^ 

keu leu '^' 



Uk 
T^k 



ui_ 



< 



< 



< 



keu leu 



< 




Uk 


„ ui\ 




09 — 1 




Uk ill 




TTfe TTl 


2 


Ep 


Uk ^ Ul 

— (8) — 



[Uk - Uk) ^Ul -Uk^ {Ul - Ul 
{Uk -Uk) ®Uk -Uk®{Uk -Uk)\\lj\ 

EE(^pii"^ 




1/2 



Uk\\ \\ui\\ + Ep 



\ui 



2,,^ „2\V2 
Ul\\ \\Uk\ 



keU l^k 



-C^E 



keu 



p [\\Uk -Uk\ 



I l|2 , 11^ ||2 
\Uk\\ +\\Uk\\ 



1/2 



For k ^ I we have with assumption (A3) that A|^ < CN ^. Furthermore, since n < 
N < n/X, the estimated linearized variables for the mean function satisfy \\uk\\ = 0{n~'^) 
uniformly in k as well as for the eigenvalues (2^)^ = 0{n^'^). 



For the mean function /i we have 



Uk-Uk = -{^i-^i) + j^^ 



N 



and thus we easily get that Ep\\uk 
eigenvalues, we have 



Uk\ 



{Yk - fi) 
0{N^'^) uniformly in k. Considering the 



Uk -Uk 



1 

N 



{{Yk-fi,v,f-{Yk 



fi,Vjf + Xj 



A,) 



1 N- N 



{{Yk- ft,Vjf - Xj). 



N N 

After some manipulations we also get that Ep{uk — Uk)"^ = 0{N~^) uniformly in k. Com- 
bining the previous results we get Ep{B) = o{n~^) and the result is proved. 

The technique is different for the eigenfunctions vi,. . . ^Vq because we cannot bound 
easily terms like Ep{Xj — Xj+i)^^ which appear in the estimators of the linearized variables. 
By the Cauchy Schwarz inequality we have 

1/2 / \ 1/2 



B < lyy (^MlnIiV 



.keU l^k 



Uk (gi-u/l 



(26) 
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+ X] ( 2") y^ Wukf^Uk -Uk(^Uk\\l 



1/2 



Kk&U ■^^'^'^fc/ / \k(^u 

By assumptions (A2) and (A3) we have, for k ^ I, 



(27) 



eJ^sMlX 



A 



^p. 



T^kl'^k'^l J 



T^klT^k'^l 



kl ^ ^ 

2^2 - ^2- 



for some constant Cg that does not depend on k and /. When k = 1,'we have £'p I ''^ '' ] < 
Cj. Thus, by Markov inequahty we have 

1/2 

= Op(l), 






and 



E 



^kkh 



ykdU ^ ^'^ 



1/2 



Op(V^). 



Considering the terms containing hnearized variables in (26) and (27), we have the 
general inequality 

Y^ Y^ II ^ ^ ^^l|2 ^ o Y^ Y^ II ^ l|2 II ||2 I 11^ l|2 11^ ||2 

2_^ l^\\Uk ® Ui - Uk ® UiW^ < 2 2^2^ \\Uk-Uk\\ \\Ul\\ +\\UI-Ul\\ \\Uk\\ . 

k&U l^k k(^U l^k 



Let us make now the following decomposition 

n.r n fN-N\ 1 

\\uk-Uk\\ < \\Nuk\\ I ^ + — 

\ NN N 



Nuk - Nuk 



(28) 



with 



Nuk - Nuk = {Yk - /i, Vj) ^ ^ _ ' ^ f £ - {Yk - A, Vj) ^ 



^T^i 



Xn — Xf 



if^j 



{Yk - fi, ve) . 

— — ve. 

Xj — Xe 



It is clear that ||A^Ufc|| = 0(1) uniformly in k and 



N-N 
NN 



Op{n ^/^). We have for the 



second right hand term of inequality (28) 
Nuk - Nuk 



< \{Yk- /x, Vj) - {Yk - fi, Vj) 



E 



{Yk - /i, vi) 



Xj — Xp 



V£ 



+ \{Yk-fi,Vj)\ 






{Yk - /i, ve) 
Xj — Xe 



Ve 



E 



{Yk - fi, ve) 



e^j ^j ~ ^f- 



-Ve 



(29) 



It is clear that the first term at the right hand side of previous inequality satisfies, uniformly 
in /c, 

2 

1" 



Ep I {Yk - /i, Vj) - {Yk - A, Vj) 



E{Yk - /", Ve) 
—. ^ VI 



W 



Xj — Xe 



O 



n 
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Let us introduce the random variable 



T = min(Aj — A^+i, Aj_i — Xj) min(Aj — A 



j+i, Aj_i 



A,), 



the eigenvalues being distinct, we have with Proposition 1 that i = Op{l). As far as the 



second term in ( 29 ) is concerned we can write 



E 

4|a.-A 



{Xj - Xg){Yk - /i, Vi)vi - {Xj - Xi){Yk - A, ve)ve 



{Xj — Xe){Xj — Xi) 



< 



2^2 






H,vi) + 



2^2 



E(^^ 



\i?{Yk 



fJ',ve) 



+4X] 



E 



{Yk - fi, ve)vi 






(A,-A,)(A,-A,) 



(30) 



and thus the first two terms in (30) are 



We have seen that sup^ jA^ — A^p < 

Op(n~^). The assumption that F is a finite rank operator is needed to deal with the last 

term of (30). Using the fact that ||u£ — u^H < Cbj F — F where bj is defined in (19), we 

also get that this last term is also Op{n~^\ Combining all these results we finally get that, 
uniformly in k 

Ikfc -Uk\ = Op(n~^/2). 

It can be checked easily, under the finite rank assumption of V that, uniformly in fc, 
ll^fcll = Op{n~^) and \u\~\ = 0{n~^\ Going back now to (26) and (27) we get that 
B = Op(l)Op(n-3/2) + Op{n^''^)Op{n-'^) = Op{n-^). This concludes the proof. □ 
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